Using pdf2Data CLI engine with PHP

The pdf2Data Parsing Engine is available as a REST API service and as a CLI, in addition to native libraries for Java and the .NET framework. Both the REST API service and the CLI let you extract data from PDF files using any programming language. We recommend the REST API parsing engine whenever possible, but in some situations you may prefer to extract data with the CLI, for instance for security reasons in your application architecture.

Prerequisites

To follow this guide, make sure that you have:

  1. An extraction template created using pdf2Data Editor.
  2. The CLI Parsing engine installed.

In this article, you can find code to use pdf2Data CLI Engine from PHP.

PHP is a widely used general-purpose scripting language. It can run an external program through the exec function, which makes it straightforward to call the pdf2Data CLI. The examples below demonstrate how to execute the pdf2Data CLI using the PHP exec function.
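Because exec can also report the exit code and the printed output of the program it runs, a small wrapper makes failures easier to spot. The following is a minimal sketch, not part of pdf2Data: the helper name runCommand is hypothetical, and it assumes that the CLI signals failure with a non-zero exit code, which this article does not confirm.

```php
<?php
// Hypothetical helper (not part of pdf2Data): run a command with exec and
// return its exit code together with the lines it printed.
function runCommand(string $command): array
{
    $output = [];
    $exitCode = -1;
    // Appending 2>&1 merges stderr into stdout so error messages are captured too.
    exec($command . ' 2>&1', $output, $exitCode);
    return ['exitCode' => $exitCode, 'output' => $output];
}

// Same command line as in the preprocess step of this article.
$result = runCommand('java -jar cli.jar preprocess -s template.p2dta -d template.p2d');

// Assumption: a non-zero exit code means the call did not succeed.
if ($result['exitCode'] !== 0) {
    error_log("pdf2Data CLI call failed:\n" . implode("\n", $result['output']));
}
```

Logging the captured output rather than discarding it keeps any message printed by Java or by the CLI available when a call does not behave as expected.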

First, preprocess the template file template.p2dta, which contains your defined extraction rules.

<?php
/* Preprocess the template file template.p2dta and write the result to template.p2d */
exec('java -jar cli.jar preprocess -s template.p2dta -d template.p2d');
?>
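When the file names come from variables rather than fixed strings, it is safer to quote each one before it reaches the shell. The sketch below builds the same preprocess command with PHP's escapeshellarg; the function name buildPreprocessCommand and the sample file names containing spaces are illustrative assumptions, while the subcommand and the -s and -d flags are the ones used above.

```php
<?php
// Hypothetical helper: assemble the preprocess command from variable paths.
// escapeshellarg quotes each argument so spaces or shell metacharacters in a
// file name are passed to the CLI as a single literal argument.
function buildPreprocessCommand(string $jar, string $source, string $destination): string
{
    return sprintf(
        'java -jar %s preprocess -s %s -d %s',
        escapeshellarg($jar),
        escapeshellarg($source),
        escapeshellarg($destination)
    );
}

// Illustrative file names; the quoting style in the result depends on the platform.
$command = buildPreprocessCommand('cli.jar', 'my template.p2dta', 'my template.p2d');
echo $command, PHP_EOL;
```

The resulting string can be passed to exec in the same way as the fixed command above.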

Then you can extract data from the PDF file file_for_parsing.pdf using the template you generated above. The extracted data can be saved as XML:

<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it as XML in recognized.xml */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json');
?>

or as JSON:

<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it as JSON in recognized.json */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json');
?>
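Once the JSON file has been written, it can be loaded back into PHP with json_decode. This is a minimal sketch under stated assumptions: the helper name loadExtractedData is hypothetical, and since the layout of recognized.json is not described in this article, the code only assumes that the top-level value is a JSON object or array.

```php
<?php
// Hypothetical helper: read a JSON file written by the parse step and decode
// it into a PHP array. The structure of the data is not assumed here.
function loadExtractedData(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // true = decode JSON objects into associative arrays.
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException(
            "Could not decode $path as a JSON object or array: " . json_last_error_msg()
        );
    }
    return $data;
}

// Only attempt to load the result if the parse step actually produced it.
if (is_file('recognized.json')) {
    print_r(loadExtractedData('recognized.json'));
}
```

Inspecting the print_r output for one of your own documents is a practical way to see which keys your template produces before writing code that depends on them.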